Sentiment analysis is attracting increasing attention and has become a very active research topic due to its potential applications in personalized recommendation, opinion mining, etc. Most existing methods are based on either textual or visual data alone and cannot achieve satisfactory results, as it is very hard to extract sufficient information from a single modality. Inspired by the observation that strong semantic correlation exists between visual and textual data in social media, we propose an end-to-end deep fusion convolutional neural network that jointly learns textual and visual sentiment representations from training examples. The two modalities are fused in a pooling layer and fed into fully-connected layers to predict sentiment polarity. We evaluate the proposed approach on two widely used data sets. Results show that our method achieves promising performance compared with state-of-the-art methods, which clearly demonstrates its competitiveness.
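The fusion-then-predict pipeline described above can be illustrated with a minimal, stdlib-only sketch. This is not the paper's implementation: the feature values, dimensions, weights, and the choice of elementwise max pooling as the fusion operation are all illustrative assumptions standing in for the learned CNN branches, pooling layer, and fully-connected layers.

```python
import math

def fuse_and_predict(text_feat, visual_feat, weights, bias):
    """Fuse two modality feature vectors, then score sentiment polarity.

    Illustrative stand-in for the paper's pipeline: elementwise max
    pooling fuses the modalities; a single linear layer plus a sigmoid
    stands in for the fully-connected prediction layers.
    """
    assert len(text_feat) == len(visual_feat) == len(weights)
    # Pooling-layer fusion of the two modality representations
    fused = [max(t, v) for t, v in zip(text_feat, visual_feat)]
    # Fully-connected layer: weighted sum plus bias
    score = sum(w * f for w, f in zip(weights, fused)) + bias
    # Sigmoid maps the score to P(positive sentiment)
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical toy features; in the paper these come from the
# textual and visual CNN branches trained end-to-end.
text_feat   = [0.2, 0.8, -0.1]
visual_feat = [0.5, 0.1,  0.3]
weights     = [1.0, 1.0, -0.5]
bias        = -0.3

print(fuse_and_predict(text_feat, visual_feat, weights, bias))
```

In the actual model the fusion output would feed several fully-connected layers trained jointly with both CNN branches; the single linear layer here only shows where the fused representation enters the polarity classifier.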